chore: Modernize the MongoDB Atlas Mixin #1544


Open

schmikei wants to merge 2 commits into grafana:master from schmikei:chore/mongodbatlas-modernization-effort

Contributor

schmikei commented Nov 17, 2025 (edited)

Some of the screenshots are missing data mostly due to me not setting up sharding, but queries/functionality should be pretty similar to the original.

MongoDB Atlas cluster overview

Paginated the tables 🚀

MongoDB Atlas elections overview

MongoDB Atlas operations overview

MongoDB Atlas performance overview

MongoDB Atlas sharding overview


          modernize the mongodbatlas mixin

e64ded5

schmikei mentioned this pull request

chore: Modernize mongodb-atlas-mixin #1530

Closed


          testing fixes/aggregate average catch-up operations

e63ac4d

schmikei commented


mongodb-atlas-mixin/prometheus_rules_out/prometheus_alerts.yaml

    
                      - alert: MongoDBAtlasElectionTimeouts

                        annotations:

-                          description: The number of elections being called due to the primary node timing out in replica set {{$labels.rs_m}} in cluster {{$labels.cl_name}} is {{printf "%.0f" $value}} which is above the threshold of 10.

+                          description: The number of elections being called due to the primary node timing out in replica set {{$labels.rs_nm}} in cluster {{$labels.cl_name}} is {{printf "%.0f" $value}} which is above the threshold of 10.

Contributor Author

schmikei Nov 17, 2025

Identified a typo here

schmikei marked this pull request as ready for review

November 18, 2025 19:08

schmikei requested a review from a team as a code owner

November 18, 2025 19:08

schmikei added the monitoring-mixins label

Dasomeone reviewed


Member

Dasomeone left a comment

Have to leave it here as it's end of day, but first pass review sort of done.

Generally, layout is A+ and I'm perfectly happy with it.
A couple of suggestions for improvements in terms of legend tables and filtering. I have yet to do a pass on the usage of common-lib, so no comments there yet.

mongodb-atlas-mixin/config.libsonnet

    
                  dashboardTimezone: 'default',

                  dashboardRefresh: '1m',

                // Basic filtering - MongoDB Atlas uses job and cl_name (cluster name) as primary filters

                filteringSelector: 'job="integrations/mongodb-atlas"',

Member

Dasomeone Dec 17, 2025

As we've recently talked about, we can vendor latest logs-lib and unset this for the public mixin

mongodb-atlas-mixin/config.libsonnet

    
                alertsDeadlocks: 10,  // count

                alertsSlowNetworkRequests: 10,  // count

                alertsHighDiskUsage: 90,  // %

                alertsSlowHardwareIO: 3,  // seconds

Member

Dasomeone Dec 17, 2025

Like I commented on a previous PR, we could consider having the units be more tightly coupled with the native metric unit, e.g. milliseconds in order to simplify the query

mongodb-atlas-mixin/alerts.libsonnet

    
                        {

                          alert: 'MongoDBAtlasSlowHardwareIO',

                          expr: |||

                            (sum without (disk_name) (increase(hardware_disk_metrics_read_time_milliseconds[5m])) + sum without (disk_name) (increase(hardware_disk_metrics_write_time_milliseconds[5m]))) / 1000 > %(alertsSlowHardwareIO)s

Member

Dasomeone Dec 17, 2025

Like I commented on a previous PR, we could consider having the units be more tightly coupled with the native metric unit, e.g. milliseconds in order to simplify the query
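A minimal sketch of what this suggestion could look like, assuming the threshold is stored in milliseconds so the `/ 1000` conversion can be dropped. The value `3000` is an assumption chosen to match the current 3-second threshold, and the standalone `config` object stands in for the mixin's real config; neither is taken from the PR.

```jsonnet
// Hypothetical stand-in for the mixin config; only the unit of the threshold changes.
local config = {
  alertsSlowHardwareIO: 3000,  // milliseconds (assumed equivalent of the current 3 seconds)
};

{
  alert: 'MongoDBAtlasSlowHardwareIO',
  // Both metrics are already in milliseconds, so the sum is compared to the threshold directly.
  expr: |||
    sum without (disk_name) (increase(hardware_disk_metrics_read_time_milliseconds[5m])) + sum without (disk_name) (increase(hardware_disk_metrics_write_time_milliseconds[5m])) > %(alertsSlowHardwareIO)s
  ||| % config,
}
```

The trade-off is that anyone overriding `alertsSlowHardwareIO` would need to supply milliseconds, so the config comment and any alert annotation text would have to change with it.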

mongodb-atlas-mixin/panels.libsonnet

Member

Dasomeone Dec 17, 2025

Given the number of panels and dashboards, could you please split this panel file into logical groups, similar to the structure used by the Kafka and SNMP observability libraries?

mongodb-atlas-mixin/panels.libsonnet

Comment on lines +167 to +188

    
                  hardwareIO:

                    commonlib.panels.generic.timeSeries.base.new('Hardware I/O', targets=[

                      signals.cluster.diskReadCount.asTarget(),

                      signals.cluster.diskWriteCount.asTarget(),

                    ])

                    + g.panel.timeSeries.panelOptions.withDescription("The number of read and write I/O's processed.")

                    + g.panel.timeSeries.standardOptions.withUnit('iops')

                    + g.panel.timeSeries.options.legend.withPlacement('right')

                    + g.panel.timeSeries.options.legend.withAsTable(true),

                  hardwareIOWaitTime:

                    commonlib.panels.generic.timeSeries.base.new('Hardware I/O wait time / $__interval', targets=[

                      signals.cluster.diskReadTime.asTarget()

                      + g.query.prometheus.withInterval('2m'),

                      signals.cluster.diskWriteTime.asTarget()

                      + g.query.prometheus.withInterval('2m'),

                    ])

                    + g.panel.timeSeries.panelOptions.withDescription('The amount of time spent waiting for I/O requests.')

                    + g.panel.timeSeries.standardOptions.withUnit('ms')

                    + g.panel.timeSeries.options.tooltip.withSort('desc')

                    + g.panel.timeSeries.options.legend.withPlacement('right')

                    + g.panel.timeSeries.options.legend.withAsTable(true),

Member

Dasomeone Dec 17, 2025

For these two panels, I think it'd be beneficial if we make use of the table legend options to add last*, min, mean, and max columns here. Should be available via standard options, can't remember off the top of my head, but Gabriel just used it in the postgres mixin last week
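One way the requested legend columns might be expressed with grafonnet, as a rough sketch. The import path and the example panel are assumptions; in grafonnet the relevant setters appear to live under `options.legend` (`withDisplayMode`, `withCalcs`) rather than under `standardOptions`.

```jsonnet
// Assumption: grafonnet is vendored at this path; the mixin may import it through its own g.libsonnet instead.
local g = import 'github.com/grafana/grafonnet/gen/grafonnet-latest/main.libsonnet';

// Reusable legend mixin: table legend on the right with summary columns.
// 'lastNotNull' is the calculation shown as "Last *" in the Grafana UI.
local tableLegend =
  g.panel.timeSeries.options.legend.withDisplayMode('table')
  + g.panel.timeSeries.options.legend.withPlacement('right')
  + g.panel.timeSeries.options.legend.withCalcs(['lastNotNull', 'min', 'mean', 'max']);

// Hypothetical panel, used only to show how the legend mixin composes with other options.
g.panel.timeSeries.new('Hardware I/O')
+ g.panel.timeSeries.standardOptions.withUnit('iops')
+ tableLegend
```

Defining the legend once and adding it to each panel would keep the columns consistent across the dashboards.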

mongodb-atlas-mixin/panels.libsonnet

    
                    + g.panel.timeSeries.standardOptions.withUnit('reqps')

                    + g.panel.timeSeries.options.tooltip.withSort('desc'),

                  networkThroughput:

Member

Dasomeone Dec 17, 2025

+1 for last*, and at least mean as additional data columns for a quick overview on the side

mongodb-atlas-mixin/panels.libsonnet

Comment on lines +355 to +357

    
                  //

                  // Elections panels

                  //

Member

Dasomeone Dec 17, 2025

For these election panels, with multiple series per instance monitoring we may need to do some filtering on the 0 values, though I'm also worried about discarding good data. @schmikei @aalhour any ideas here?
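For reference, the zero-value filtering being discussed could be done in PromQL with a comparison operator, sketched below. The metric name is a placeholder, not one taken from the mixin, and the thread does not settle on this approach.

```jsonnet
// Placeholder metric name, used only for illustration; the job selector is the one from config.libsonnet.
local base = 'increase(example_election_timeouts_total{job="integrations/mongodb-atlas"}[5m])';

{
  // Unfiltered: every series is drawn, including those that stay at 0.
  unfiltered: base,
  // Filtered: the comparison keeps only samples greater than 0, so idle series drop out of the panel.
  filtered: base + ' > 0',
}
```

The filter removes the zero samples themselves, not just their legend entries, which is the data-loss concern raised in the comment: a series that returns to 0 shows a gap rather than a line at zero.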

mongodb-atlas-mixin/panels.libsonnet

    
                    + g.panel.timeSeries.standardOptions.withUnit('reqps')

                    + g.panel.timeSeries.options.tooltip.withSort('desc'),

                  slowNetworkRequestsPerformance:

Member

Dasomeone Dec 17, 2025

+1 for this panel, and for networkThroughputPerformance in general: both may need some filtering, as the current values affect the y-axis scaling and will keep doing so.

mongodb-atlas-mixin/panels.libsonnet

    
                    + g.panel.timeSeries.options.legend.withPlacement('right')

                    + g.panel.timeSeries.options.legend.withAsTable(true),

                  hardwareIOWaitTime:

Member

Dasomeone Dec 17, 2025

+1 filtering, I'll stop commenting on them now


Labels

monitoring-mixins